Adaptive Representation Selection in Contextual Bandit with Unlabeled History

نویسندگان

  • Baihan Lin
  • Guillermo A. Cecchi
  • Djallel Bouneffouf
  • Irina Rish
چکیده

We consider an extension of the contextual bandit setting, motivated by several practical applications, where an unlabeled history of contexts can become available for pre-training before the online decisionmaking begins. We propose an approach for improving the performance of contextual bandit in such setting, via adaptive, dynamic representation learning, which combines offline pre-training on unlabeled history of contexts with online selection and modification of embedding functions. Our experiments on a variety of datasets and in different nonstationary environments demonstrate clear advantages of our approach over the standard contextual bandit.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model wi...

متن کامل

Switching between Representations in Reinforcement Learning

This chapter presents and evaluates an on-line representation selection method for factored MDPs. The method addresses a special case of the feature selection problem that only considers certain sub-sets of features, which we call candidate representations. A motivation for the method is that it can potentially deal with problems where other structure learning algorithms are infeasible due to a...

متن کامل

BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits

We present efficient algorithms for the problem of contextual bandits with i.i.d. covariates, an arbitrary sequence of rewards, and an arbitrary class of policies. Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself...

متن کامل

Active contextual policy search

We consider the problem of learning skills that are versatilely applicable. One popular approach for learning such skills is contextual policy search in which the individual tasks are represented as context vectors. We are interested in settings in which the agent is able to actively select the tasks that it examines during the learning process. We argue that there is a better way than selectin...

متن کامل

An Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures

We investigate the contextual multi-armed bandit problem in an adversarial setting and introduce an online algorithm that asymptotically achieves the performance of the best contextual bandit arm selection strategy under certain conditions. We show that our algorithm is highly efficient and provides significantly improved performance with a guaranteed performance upper bound in a strong mathema...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1802.00981  شماره 

صفحات  -

تاریخ انتشار 2018